Applications
Objectives
Methods
Interpretation
February 22, ’23
aid in discovery of new populations of imperiled plants
aid in creation of reserves under climate change models
aid in predicting joint species distributions, i.e. obligate mutualisms
using known occurrences of a species, identify areas which have similar habitat and the potential to support populations
but, what about dispersal?
competition?
mutualisms?
Besseya (=Synthyris) alpina (A. Gray) Rydberg.
American Basin
B. alpina, Franklin #3948
define spatial domain and grain
software environments
dependent variables
independent variables
modelling approaches
model evaluation
predicting a model into space
domain; spatial extent of study
- administrative boundary
- ecological model
grain; scales in space and time
- resolution at which process occurs (space)
- current and past climate (time)
- projected climates
- (animals) seasonal patterns?
limitation: compute power
Domain
R
GRASS GIS: many modules for creating predictors
QGIS: graphical user interface for mouse-guided visualization
occurrences of a species in space (and time)
Linear models:
Occurrence Records
explicitly check for variation
carefully encode categorical data
too much, may not be useful
too little, may not be useful
pilot knock out studies; use one variable leaving the others out
warrants simplifying a variable?
correlated!
“we are stronger together than we are alone” - Walter Payton
correlated!
all evaluation performed by computer – too much information
much more common approach than individual linear models
species distributions are generally too complex for individual predictors, and building fully interactive terms would take a long time.
the typical approach since the late 90’s
do the work for you
none, get a few observations, the more the merrier.
train/test split (partition data)
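A train/test split can be sketched in a few lines. This is a minimal illustration in plain Python, not the partitioning routine of any particular R package; the function name `train_test_split`, the 30% test fraction, and the toy `(elevation_m, presence)` records are all assumptions made for the example.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle the records and partition them into training and test sets."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(records)           # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical occurrence records: (elevation_m, presence) pairs
records = [(3000 + 10 * i, i % 2) for i in range(20)]
train, test = train_test_split(records)
```

The model is fit on `train` only; `test` is held back so that evaluation uses observations the model has never seen.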
\[ Accuracy = \frac{\text{correct classifications}}{\text{all classifications}} \]
\[ Sensitivity = \frac{\text{true positives}}{\text{true positives + false negatives}} \]
Sensitivity: the probability of the method giving a positive result when the test subject is positive.
\[ Specificity = \frac{\text{true negatives}}{\text{true negatives + false positives}} \]
Specificity: the probability of the method giving a negative result when the test subject is negative.
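The three metrics follow directly from the counts in a confusion matrix. The sketch below assumes presence is coded as 1 and absence as 0; the function names and the ten observed/predicted values are made up for illustration.

```python
def confusion_counts(observed, predicted):
    """Count true/false positives and negatives for 0/1 coded data."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return tp, tn, fp, fn

def evaluate(observed, predicted):
    """Accuracy, sensitivity and specificity as defined above."""
    tp, tn, fp, fn = confusion_counts(observed, predicted)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical test-set data: 4 presences and 6 absences
observed  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluate(observed, predicted)
# accuracy = 7/10, sensitivity = 3/4, specificity = 4/6
```

Here the model finds three of four presences (sensitivity 0.75) but flags two of six absences as presences (specificity about 0.67), which accuracy alone would hide.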
prediction
Ensemble learning combines many decision trees, each composed of many binary decisions, into a single model. Each independent variable (or feature) may become a node on a tree, i.e. a location where a binary decision moves towards a predicted outcome. Each individual decision tree is a weak model: it may suffer from high variance or bias, but it still predicts better than would be expected by chance. When ensembled, these weak models yield a strong model, one with more appropriately balanced variance and bias, whose predictions correlate more strongly with the expected values than those of the individual weak models.
Random Forest (RF): the training data are repeatedly bootstrap re-sampled and, in combination with random subsets of features, used to grow trees whose nodes attempt to optimally predict a known outcome. A large number of trees are then aggregated, by taking the most common prediction across trees, to generate the final classification. Each individual tree is grown independently of the others.
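The bootstrap-and-vote idea can be shown with a deliberately tiny sketch. To stay short it assumes one-split trees ("stumps") rather than full trees, one randomly chosen feature per tree, and invented `(elevation_m, slope_deg)` predictors; it is an illustration of the principle, not how any R random forest package is implemented.

```python
import random
from collections import Counter

def fit_stump(rows, labels, features):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            for above in (0, 1):  # class predicted when the value exceeds t
                errors = sum((above if r[f] > t else 1 - above) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    return best[1:]

def predict_stump(stump, row):
    f, t, above = stump
    return above if row[f] > t else 1 - above

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=1):
    """Grow each stump independently on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, forest = len(rows), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # sample rows with replacement
        feats = rng.sample(range(len(rows[0])), n_features)   # random subset of features
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Aggregate the trees by taking the most common prediction."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Hypothetical data: (elevation_m, slope_deg); presence (1) at high, steep sites
rows = [(3000, 8), (3050, 10), (3100, 12), (3150, 14), (3200, 16), (3250, 18),
        (3600, 25), (3650, 27), (3700, 29), (3750, 31), (3800, 33), (3850, 35)]
labels = [0] * 6 + [1] * 6
forest = fit_forest(rows, labels)
low_site = predict_forest(forest, (2800, 5))
high_site = predict_forest(forest, (3900, 40))
```

Because every stump sees a different resample and a different feature, individual stumps disagree on the exact threshold, and the vote smooths those differences out.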
Boosted Regression Tree (BRT), or gradient boosted tree: an initial tree is grown, and all other trees are derived sequentially from it. As each new tree is grown, the errors in responses from the preceding tree are weighted more heavily, so that the model focuses on selecting independent variables which refine predictions. All response data and predictor variables are kept available to all trees.
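Sequential fitting can be sketched as follows. The example assumes the gradient-boosting formulation with squared error, where each new tree is fit to the residuals (the current errors) of the ensemble so far; it again uses one-split stumps on a single predictor, and the learning rate, tree count, and toy data are invented for illustration.

```python
def fit_regression_stump(xs, residuals):
    """Choose the threshold on x that minimizes squared error of the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x so the right leaf is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then add stumps one at a time, each fit to current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]   # errors of the ensemble so far
        t, left_mean, right_mean = fit_regression_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical data: x is a habitat index, y is observed suitability
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.1, 0.2, 0.1, 0.3, 0.8, 0.9, 0.7, 0.9]
model = boost(xs, ys)
mse_mean_only = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_boosted = mse(ys, [predict(model, x) for x in xs])
```

Unlike the random forest, the trees here are not independent: each one depends on what the previous trees got wrong, which is why the order of fitting matters.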
Hijmans, Robert J. 2022. Terra: Spatial Data Analysis. https://CRAN.R-project.org/package=terra.
Kuhn, Max. 2022. Caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret.
Naimi, Babak, and Miguel B. Araujo. 2016. “Sdm: A Reproducible and Extensible R Platform for Species Distribution Modelling.” Ecography 39: 368–75. https://doi.org/10.1111/ecog.01881.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.